Information mapping is a popular application of Multivoxel Pattern Analysis (MVPA) to fMRI. Information
maps are constructed using the so-called searchlight method, where the spherical multivoxel neighborhood
of every voxel (i.e., a searchlight) in the brain is evaluated for the presence of task-relevant response patterns.
Despite their widespread use, information maps present several challenges for interpretation. One such
challenge has to do with inferring the size and shape of a multivoxel pattern from its signature on the information
map. To address this issue, we formally examined the geometric basis of this mapping relationship. Based
on geometric considerations, we show how and why small patterns (i.e., having smaller spatial extents) can
produce a larger signature on the information map as compared to large patterns, independent of the size
of the searchlight radius. Furthermore, we show that the number of informative searchlights over the brain
increases as a function of searchlight radius, even in the complete absence of any multivariate response
patterns. These properties are unrelated to the statistical capabilities of the pattern-analysis algorithms used but
are obligatory geometric properties arising from using the searchlight procedure.
1. Introduction

In fMRI, a functional map is an important representation of how
cognitive function is related to neuroanatomy. Such maps provide a
topographic representation of the brain regions that are (and are
not) systematically responsive to differing values of a cognitive
variable. The size, shape, number and location of the "blobs" (i.e.,
voxel-clusters meeting some statistical relevance criterion) on the
functional maps are the basis for inferences about the neural
substrates of the cognitive process. Given the importance of
functional maps, there is a continuing need to scrutinize the
sensitivity, precision and technical assumptions of the mapping
procedure itself. The topic of the current technical note is the
mapping procedure used to generate the widely used information maps
(Kriegeskorte et al., 2006).

The motivation for information mapping is the statistical concern
that a region's responses to the cognitive variable under study might
take a complex multivariate form. For example, a group of multiple
voxels might conjointly respond in a task-relevant manner even though
individual voxels may not detectably do so (Haxby et al., 2001; Cox
and Savoy, 2003; Haynes, 2006; Norman et al., 2006; Mur et al., 2008;
Tong, 2010; Wagner and Rissman, 2010; Formisano and Kriegeskorte,
2012; Serences and Saproo, 2012). Such distributed response patterns
might be effectively undetectable with conventional univariate
statistical tests restricted to individual voxel responses, but
detectable with an explicit multivariate test for multivoxel response
patterns, i.e., using some Multivoxel Pattern Analysis (MVPA)
technique. To address this concern in the context of functional
mapping, Kriegeskorte et al. (2006) proposed a simple procedure to
enable sophisticated MVPA methods to be readily applied to detect and
map brain regions that contain information about the experimental
conditions, irrespective of whether the informative responses are
univariate or multivariate.

In the proposed procedure, the unit of evaluation is not the single
voxel but a "searchlight" - the group of voxels contained in a
spherical neighborhood of radius r around a single voxel. The
searchlight statistic is a measure of whether the conjoint responses
of this group of voxels contain information about the experimental
conditions being tested. Based on these abstractions, an information
map is generated as follows: the searchlight statistic is evaluated
for searchlights centered at every voxel in the brain; and the
statistic's value for each searchlight is mapped to the central voxel
of that searchlight. The resulting topographic representation
generated by the searchlight procedure has been referred to as the
information map. Such searchlight-based information maps are now
routinely reported in studies that employ MVPA methods (for example,
Haynes et al., 2007; Soon et al., 2008; Johnson et al., 2009;
Poldrack et al., 2009; Chadwick et al., 2010; Oosterhof et al., 2010;
Nestor et al., 2011; Alink et al., 2011; Golomb and Kanwisher, 2011;
Peelen and Kastner, 2011; Stokes et al., 2011; Woolgar et al., 2011;
Morgan et al., 2011; Oosterhof et al., 2012; Connolly et al., 2012;
Kaplan and Meyer, 2012).



Notwithstanding their popularity, interpreting a searchlight-based
information map presents a variety of challenges (Kriegeskorte et
al., 2006; Poldrack et al., 2009; Pereira and Botvinick, 2011; Jimura
and Poldrack, 2012). One such challenge is posed
by the topographic ambiguity of the information
map. Recall that a searchlight statistic computed on 
the responses of an entire multivoxel searchlight is 
mapped to a single voxel on the information map, 
namely, that searchlight's central voxel. This map- 
ping protocol is applied to searchlights across the 
brain irrespective of the number or the spatial loca- 
tions of the information-carrying voxels within each 
searchlight. Consequently, the spatial position of 
an informative voxel on the information maps is a 
coarse index to the actual location of the informative 
"pattern" within that voxel's searchlight. Further- 
more, since a searchlight has a unique central voxel 
v, an informative voxel on the information map is 
not indicative of the actual number of voxels con- 
stituting the informative pattern within that voxel's 
searchlight neighborhood. Given these properties of 
the information map, we asked: what, if anything, 
can be reliably inferred about the size and shape of a 
multivoxel pattern from its corresponding signature 
on the information map? 

Previous studies have treated this question as a qualitative concern
requiring cautious interpretation. Nonetheless, here we show that
information maps are in fact subject to several crisply quantitative
geometric constraints that strongly govern how such maps can be
interpreted.

Our analytical results are based on a simple geomet- 
ric intuition. Since a multivoxel searchlight is de- 
fined at every voxel across the brain, searchlights 
centered at different voxels systematically overlap 
each other, i.e., have voxels in common. Using over- 
lapping searchlights is crucial to obtain a continuous 
topographic coverage especially when the locations 
and spatial extents of voxel-neighborhoods that are 
task-responsive are unknown a priori. We observed 
that due to these overlaps, multiple searchlights 
would be deemed informative merely by virtue of 
sharing the same task-relevant multivoxel response 
patterns. Thus we reasoned that the size and shape 
of a multivoxel group G's signature on the informa- 
tion map should be defined by exactly those vox- 
els which have searchlight-neighborhoods that con- 
tain G. Using this observation and simple geometric 
reasoning, we formally deduce some key properties 
of the relationship between an informative pattern 
and its corresponding signature on the information 
map. 

Based on our formal analysis, we prove here that, 
for any searchlight radius, a single task-responsive 
voxel produces a larger signature on the informa- 
tion map as compared to a distributed multivoxel 
response pattern. Furthermore, the number of in- 
formative searchlights over the brain can increase as 
a function of searchlight radius, without necessar- 
ily revealing any new information and even in the 
complete absence of any multivariate response pat- 
terns. Importantly, these properties are largely inde- 
pendent of the type of machine-learning algorithm 
or the testing protocol used to compute the search- 
light statistic. 



2. Model 



2.1. Definition: The searchlight decomposition

The basis of the searchlight analysis is the geometric structure of
the voxel-space in which the brain images are defined. The voxel-space
$\mathcal{V}$ is defined here as the set of all voxels $V$ augmented
with a geometric structure defining the relative spatial position of
the voxels in $V$, and a distance measure between these voxels. For
analytical convenience, we treat the voxel-space as being uniform and
connected as described below.

A $d$-dimensional voxel-space $\mathcal{V}$ is deemed to be uniform if
every voxel has a neighboring voxel in all $d$ principal directions.
Additionally, we assume that the voxel-space $\mathcal{V}$ is
connected. Specifically, there is a path connecting every pair of
voxels $v_i$ and $v_j$ in $V$, with a path defined here to be an
ordered sequence of voxels $(v_i, \ldots, v_k, v_{k+1}, \ldots, v_j)$
where voxel $v_{k+1}$ is a neighbor of $v_k$ along one of the $d$
principal directions. These simplifying assumptions are intended to
emphasize the general geometric principles entailed by the searchlight
method while deliberately ignoring the special cases associated with
(i) the boundaries of $V$ where a searchlight may be truncated; and
(ii) distinctions between gray-matter and white-matter voxels and any
masking of the latter from the searchlights. Although we refer to
searchlights as being volumes in a voxel-space having dimensionality
$d = 3$, the properties derived here are agnostic to the specific
value of $d$ and apply to surfaces ($d = 2$) where the searchlights
are discs (as in Oosterhof et al., 2011; Chen et al., 2011).



The key abstraction defined by the searchlight method is a
decomposition of $V$ into subsets of voxels based on a geometric
criterion. Given a voxel-space $\mathcal{V}$, we define a searchlight
voxel-decomposition using the following indexing function

$$S : V \times \mathbb{R} \to \mathcal{P}(V) \tag{1}$$

where $\mathcal{P}(V)$ is the powerset of $V$, namely, the set of all
subsets of $V$. This indexing function $S$ takes two inputs - the
identity of a voxel $v$ in the voxel-space $\mathcal{V}$, and a
real value $r \in \mathbb{R}$ specifying the searchlight's radius.
The searchlight indexing function uses these parameters in conjunction
with the geometric structure of $\mathcal{V}$ to extract and output a
set of voxels $S(r, v) \in \mathcal{P}(V)$. A voxel $v' \in V$ is a
member of $S(r, v)$ if and only if the distance between $v'$ and $v$
is less than or equal to $r$. For convenience, we henceforth write
$S_r(v)$ to denote the searchlight $S(r, v)$. The resulting searchlight
voxel-decomposition of $V$ for a given radius $r$ is defined as

$$\mathcal{S}_r(V) = \{ S_r(v) \mid \text{for all } v \in V \} \tag{2}$$

For clarity, we restrict our usage of the term "searchlight" to the
cases when the value of the radius $r$ is such that each $S_r(v)$ is a
multivoxel entity that is not identical with $V$, that is,
$1 < |S_r(v)| < |V|$, for any $S_r(v) \in \mathcal{S}_r(V)$. We refer
to the univariate case where $|S_r(v)| = 1$ as the univoxel
decomposition.

A schematic of the searchlight indexing scheme is shown in Figure 1.



2.2. Definition: Informativeness function

The searchlight statistic is a measure of whether the voxels in the
searchlight, as a unit, exhibit differences in their conjoint
responses to the experimental conditions. More generally, it is a
measure of whether the searchlight contains information about the
experimental condition, i.e., whether the searchlight is informative.
As with the radius of the searchlight, the specific statistical
procedure used to compute the searchlight statistic is a discretionary
choice made by the researcher (for example, see Pereira and Botvinick,
2011).

To describe the searchlight statistic in a procedure-independent
manner, we use a binary indicator function, which we refer to as the
informativeness function, $I : \mathcal{P}(V) \to \{0, 1\}$. Given a
subset of voxels $G \in \mathcal{P}(V)$, the function $I$ returns a
value of 1 if the responses of $G$ are deemed to be informative, or 0
if they are not, based on some appropriately specified statistical
criterion.

Evaluating the informativeness function on the responses
corresponding to each searchlight $S_r(v)$ in $\mathcal{S}_r(V)$
defines the overall information set for a particular radius

$$I(\mathcal{S}_r(V)) = \{ I(S_r(v)) \mid \text{for all } S_r(v) \in \mathcal{S}_r(V) \} \tag{3}$$

The information map is the object obtained when the information set
defined above is augmented with the geometric structure of the
voxel-space $\mathcal{V}$ by mapping the informativeness value of each
searchlight $I(S_r(v))$ to its corresponding central voxel $v$.
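The definitions above can be made concrete with a minimal sketch. It assumes a small two-dimensional grid, Euclidean distances measured in voxel units between voxel centres, and searchlights that are simply truncated at the grid boundary (a case the text deliberately sets aside). The function `toy_informative` is a hypothetical stand-in for a real searchlight statistic, not a procedure taken from the literature.

```python
from itertools import product

def searchlight(v, r, shape):
    """S_r(v): voxels of a 2-D grid within Euclidean distance r of v."""
    k = int(r)  # no offset larger than floor(r) can lie within distance r
    members = set()
    for dx, dy in product(range(-k, k + 1), repeat=2):
        x, y = v[0] + dx, v[1] + dy
        in_grid = 0 <= x < shape[0] and 0 <= y < shape[1]
        if in_grid and dx * dx + dy * dy <= r * r:
            members.add((x, y))
    return members

def information_map(shape, r, informative):
    """Assign I(S_r(v)) to the central voxel v of every searchlight."""
    voxels = product(range(shape[0]), range(shape[1]))
    return {v: informative(searchlight(v, r, shape)) for v in voxels}

def toy_informative(group):
    """Hypothetical stand-in for a searchlight statistic: 1 if the group
    contains the arbitrarily chosen voxel (4, 4), else 0."""
    return int((4, 4) in group)

info_map = information_map((9, 9), r=2, informative=toy_informative)
print(sum(info_map.values()), "of", len(info_map), "searchlights are informative")
```

Passing the informativeness function as an argument mirrors the procedure-independent treatment in the text: any classifier-based or statistical criterion could be substituted without changing the mapping step.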

The performance measure of interest here is the total number of
informative searchlights for a particular searchlight decomposition

$$F_r = \sum_{i=1}^{|V|} I(S_r(v_i)) \tag{4}$$

Figure 1: Cartoon of searchlight indexing scheme: Left panel shows two example voxels $v_a$ and $v_b$ that are mapped to searchlights
$S_r(v_a)$ (orange) and $S_r(v_b)$ (light gray) respectively, having some radius $r$, as shown in the right panel. Two searchlights can overlap
to varying degrees depending on the distance between their center voxels and the radius. Here the overlap is indicated in dark gray.
The searchlight statistic computed on each searchlight is mapped back to the corresponding central voxels to generate the
information map (see text).

2.3. Linking assumptions 

Two simple properties link the structure of the searchlight
decomposition $\mathcal{S}_r(V)$ to the structure of the information
set $I(\mathcal{S}_r(V))$.

The first property is that, by virtue of the regularity of their shape
and relative positioning, searchlights in $\mathcal{S}_r(V)$ can
overlap. Consequently, the same voxels in $V$ can be included or
sampled by multiple searchlights. The second property is that since a
sphere is an arbitrarily chosen and regular shape, it is unlikely that
every voxel is necessarily task-relevant in every informative
searchlight. Consequently, the informativeness of the responses in
some particular searchlight volume $S_r(v)$ can be alternatively and
accurately interpreted as indicating that some group of voxels in that
searchlight volume exhibits task-dependent responses. These two
properties can be combined as follows. Let $G$ be a group of
task-relevant voxels in a searchlight $S_r(v)$, where
$G \subseteq S_r(v)$. Since searchlights share voxels, if some other
searchlight $S_r(v')$ also contains $G$, that is
$G \subseteq S_r(v')$, then it implies that $S_r(v')$ should also
contain task-relevant information as it includes the task-relevant
voxels $G$.



Based on this observation, we make two linking assumptions about the
behavior of the procedures used to compute the searchlight statistic
and hence the informativeness function $I$. The first is that we
restrict the focus of our analysis to the common multivariate
procedure that does not include geometric information about the
relative spatial positions of the voxels in a searchlight while
computing that searchlight's informativeness. The second is the
Superset informativeness (SIN) assumption which postulates that:

Superset informativeness assumption: If a group of voxels $G$ is
informative then every searchlight that contains $G$ is also
informative.

That is, according to the SIN assumption, if $I(G) = 1$ then
$I(S_r(v)) = 1$ for all $v \in V$ where $|G| > 0$ and
$G \subseteq S_r(v)$. Unless otherwise stated, we will overload the
symbol $I$ to denote an informativeness function that explicitly
satisfies these two model requirements.
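One way to picture an informativeness function that satisfies both requirements is sketched below. It assumes an unbounded two-dimensional lattice with distances in voxel units, and it makes the additional simplifying assumption that a known group $G$ is the only source of information, so that $I(S) = 1$ exactly when $G \subseteq S$. The helper names are illustrative only.

```python
from itertools import product

def searchlight(v, r):
    """S_r(v) on an unbounded 2-D lattice (boundaries ignored, as in the text)."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r}

def make_sin_informativeness(G):
    """Return I with I(S) = 1 exactly when the informative group G is a
    subset of S. Membership is the only input: no spatial layout is used."""
    G = frozenset(G)
    return lambda S: int(G <= set(S))

def count_informative(G, r):
    """F_r (Equation 4) when G is the only informative group."""
    I = make_sin_informativeness(G)
    g0 = next(iter(G))
    # A searchlight containing G contains g0, and because distance is
    # symmetric its centre must lie in S_r(g0); only those centres are tested.
    return sum(I(searchlight(c, r)) for c in searchlight(g0, r))

for G in ({(0, 0)}, {(0, 0), (1, 0)}):
    print(sorted(G), count_informative(G, r=2))
```

Because the function depends only on set membership, it ignores the spatial arrangement of the voxels within a searchlight, in line with the first linking assumption.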

Although the SIN assumption is based on a sound deduction, the
empirical requirement that it poses may not necessarily be satisfied
in practice. Specifically, even if it is known that $I(G) = 1$, the
statistical procedure used to evaluate informativeness might fail to
detect that a searchlight $S_r(v)$ is informative even if
$G \subseteq S_r(v)$. Such a Type II error (i.e., failing to reject a
false null hypothesis $H_0 : I(S_r(v)) = 0$) might occur for any of a
variety of reasons, for example, the use of an inappropriate
machine-learning algorithm (Pereira and Botvinick, 2011), insufficient
power due to a limited number of samples, and so on. In this regard,
the SIN assumption treats the multivoxel pattern analysis techniques
as being more sensitive and reliable than might actually be the case
in practice. That is, the SIN assumption allows us to establish the
information map's properties in the best case, independent of the
performance idiosyncrasies of the specific multivariate method being
used.



3. Analytical results

Our focus in the current section is to establish how the structure of
the sampling bias arises from the searchlight decomposition. We first
prove that due to the geometric regularities of a searchlight
decomposition, single voxels and multivoxel groups are sampled with
different frequencies, i.e., included in a different number of
searchlights. Specifically, single voxels are included in more
searchlights than multivoxel groups. This sampling difference is
independent of the searchlight radius. We then extend these results to
prove that the frequency with which voxel-groups are sampled increases
with the radius of the searchlights, irrespective of the number of
voxels in the group. Finally, we prove that the information map
mirrors these sampling biases in an optimistic manner, i.e., in a
manner that is not necessarily warranted by the data.

3.1. Single-voxels and multivoxel-groups are sampled
with different frequencies

The regularity in the shape of the searchlights and their relative
positions in the voxel-space define a systematic relationship between
each voxel $v \in V$ and the searchlights in $\mathcal{S}_r(V)$ that
contain that voxel $v$. Firstly, if a voxel $v_a$ is a member of the
searchlight $S_r(v_b)$, then by symmetry, the voxel $v_b$ is a member
of the searchlight $S_r(v_a)$ (Lemma 1). Secondly, two distinct voxels
$v_a$ and $v_b$ are not simultaneously included in every searchlight
that contains either of these voxels (Lemma 2).

Lemma 1. If a voxel $v_b$ is a member of $S_r(v_a)$ then the voxel
$v_a$ is a member of $S_r(v_b)$, where $v_a, v_b \in V$ and
$S_r(v_a), S_r(v_b) \in \mathcal{S}_r(V)$.

Proof. Consider a searchlight $S_r(v_a)$ centered at voxel $v_a$.
Since a searchlight is defined at every voxel in $V$ (Equation 2), it
follows that there is a searchlight defined at every voxel in
$S_r(v_a)$. By definition, a voxel $v_b \in V$ is a member of
$S_r(v_a)$ if and only if the distance between $v_a$ and $v_b$ is less
than or equal to the radius $r$. Since there is a searchlight
$S_r(v_b)$ centered at $v_b \in S_r(v_a)$, and the distance between
$v_a$ and $v_b$ is less than or equal to $r$, it follows that $v_a$ is
a member of searchlight $S_r(v_b)$. Therefore, if $v_b$ is a member of
$S_r(v_a)$ then $v_a$ is a member of $S_r(v_b)$. □
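The symmetry stated in Lemma 1 can be checked by brute force on a small patch of voxels. The sketch below assumes an unbounded two-dimensional lattice with Euclidean distances in voxel units; the patch size and the radii tried are arbitrary illustrative choices.

```python
from itertools import product

def searchlight(v, r):
    """S_r(v) on an unbounded 2-D lattice, distances in voxel units."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r}

def membership_is_symmetric(r, extent=4):
    """Check v_b in S_r(v_a) <=> v_a in S_r(v_b) for all voxel pairs in a patch."""
    patch = list(product(range(-extent, extent + 1), repeat=2))
    lights = {v: searchlight(v, r) for v in patch}
    return all((b in lights[a]) == (a in lights[b])
               for a in patch for b in patch)

print(all(membership_is_symmetric(r) for r in (1, 1.5, 2, 3)))
```

The check succeeds for any radius because membership depends only on the distance between the two voxels, which does not change when their roles are swapped.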



Lemma 2. For any two non-identical voxels $v_a$ and $v_b$, where
$v_a, v_b \in V$ and $S_r(v_a) \neq S_r(v_b)$, there necessarily
exists a searchlight that contains $v_a$ but not $v_b$ and a different
searchlight that contains $v_b$ but not $v_a$.

Proof. This claim can be proved in two steps based on the distance
between $v_a$ and $v_b$.

First consider the case where the distance between $v_a$ and $v_b$ is
greater than $2r$, that is, the diameter of a searchlight. By
definition, a searchlight contains voxels that have a distance less
than or equal to $r$ from that searchlight's central voxel. Due to the
spherical shape of the searchlight, the maximum distance between any
two voxels in a searchlight is equal to $2r$. If the distance between
$v_a$ and $v_b$ is greater than $2r$, there does not exist any
searchlight of radius $r$ that contains both $v_a$ and $v_b$ as
members. Thus, it follows that there exists some searchlight that
contains $v_a$ but not $v_b$, and some other searchlight that contains
$v_b$ but not $v_a$.

Now consider the second case where the distance between $v_a$ and
$v_b$ is less than or equal to $2r$. Since the distance between these
two voxels does not exceed the maximum distance between some two
voxels in a searchlight, in a uniform voxel-space there necessarily
exists some searchlight $S_r(v)$ that contains both $v_a$ and $v_b$ as
members. Contrary to the proposition, let us assume that both $v_a$
and $v_b$ are contained in every searchlight that contains either
$v_a$ or $v_b$. That is, if a searchlight $S_r(v)$ contains $v_a$,
then it necessarily contains $v_b$, and vice versa. Recall that, from
Lemma 1, a voxel $v_a$ is contained in every searchlight $S_r(v)$
where $v \in S_r(v_a)$. Now, based on the contradictory assumption, it
implies that $v_b$ is also contained in every such searchlight
$S_r(v)$ where $v \in S_r(v_a)$. By the same reasoning, $v_a$ should
be contained in every searchlight $S_r(v)$ where $v \in S_r(v_b)$. If
these conditions hold true, then it implies that every voxel in
$S_r(v_a)$ is also contained in $S_r(v_b)$; and every voxel in
$S_r(v_b)$ is also contained in $S_r(v_a)$. If this is the case, then
the searchlights $S_r(v_a)$ and $S_r(v_b)$ are identical as they
contain exactly the same voxels. This relationship, however,
contradicts the requirement that $S_r(v_a) \neq S_r(v_b)$. Thus, the
assumption that $v_a$ and $v_b$ are both contained in every
searchlight that contains either $v_a$ or $v_b$ cannot be true.

Therefore, there necessarily exists a searchlight that contains $v_a$
that does not contain $v_b$, and some other searchlight that contains
$v_b$ but not $v_a$. □
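Lemma 2 also has a constructive reading. By the symmetry of Lemma 1, the searchlights that contain $v_a$ are exactly those centred in $S_r(v_a)$, so the searchlights that contain $v_a$ but not $v_b$ are centred at the voxels of $S_r(v_a)$ that lie outside $S_r(v_b)$. The sketch below illustrates this on an unbounded two-dimensional lattice (an assumption, as are the example voxels and radius).

```python
from itertools import product

def searchlight(v, r):
    """S_r(v) on an unbounded 2-D lattice, distances in voxel units."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r}

def centres_with_a_not_b(a, b, r):
    """Centres of searchlights that contain voxel a but not voxel b.
    A searchlight contains a exactly when its centre lies in S_r(a),
    so the answer is a set difference."""
    return searchlight(a, r) - searchlight(b, r)

a, b, r = (0, 0), (1, 0), 2
print(len(centres_with_a_not_b(a, b, r)), len(centres_with_a_not_b(b, a, r)))
```

Both differences are non-empty for any two distinct voxels on such a lattice, which is the content of the lemma.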

Armed with the properties described by Lemmas 1 and 2, we can now
count the number of searchlights that include a given individual
voxel.

Theorem 3. A voxel $v$ is contained in exactly $N_r(v)$ different
searchlights, where $N_r(v)$ is the number of voxels contained in the
searchlight $S_r(v)$.

Proof. From Lemma 1, a voxel $v$ is contained in each searchlight
$S_r(v')$ if and only if $v'$ is a voxel in $S_r(v)$. Let $N_r(v)$ be
the number of voxels in $S_r(v)$. Therefore, $v$ is present in each of
these $N_r(v)$ searchlights, and in no others. □

For simplicity, we treat $N_r(v)$ as being the same for every
searchlight, and write $N_r$ to indicate the canonical number of
voxels contained in a spherical volume of radius $r$, for a given
resolution of the voxel-space.

From Theorem 3, we see that the radius, a parameter chosen by the
researcher, directly specifies how often information in a particular
voxel is sampled by multiple searchlights. For voxels of size
3 mm × 3 mm × 3 mm, the number of voxels contained in searchlights of
different radii is shown in Figure 2. As can be seen, the number of
voxels in a searchlight, that is $N_r$, grows rapidly with the radius
of the searchlight $r$, and consequently so does the number of
searchlights that include a particular voxel.
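The growth of $N_r$ with radius can be tabulated with a short sketch. It assumes isotropic voxels with a 3 mm edge and distances measured between voxel centres; the radii listed are illustrative values and are not read off Figure 2.

```python
from itertools import product

def voxels_per_searchlight(radius_mm, voxel_mm=3, dims=3):
    """N_r: lattice points within radius_mm of a voxel centre, assuming
    isotropic voxels of edge voxel_mm and centre-to-centre distances."""
    k = int(radius_mm // voxel_mm)  # largest offset (in voxels) worth testing
    return sum(
        1
        for offset in product(range(-k, k + 1), repeat=dims)
        if voxel_mm ** 2 * sum(o * o for o in offset) <= radius_mm ** 2
    )

for radius in (4, 6, 8, 10, 12):  # radii in mm, chosen for illustration
    print(radius, voxels_per_searchlight(radius))
```

By Theorem 3, each printed value is also the number of searchlights of that radius that sample any given voxel away from the boundaries.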

Since searchlights are intended to identify multivoxel response
patterns, we extend the single-voxel property in Theorem 3 to quantify
the membership of a group of multiple voxels, placing no constraint on
the relative spatial locations of the voxels in the group.

Figure 2: Number of voxels in a searchlight volume ($N_r$) as a
function of searchlight radius ($r$) in a 3 × 3 × 3 mm³ voxel-space.

Theorem 4. A group of voxels $G$ containing more than one voxel is
contained in strictly less than $N_r$ searchlights.

Proof. A voxel is contained in $N_r$ searchlights, from Theorem 3.
Consequently, every voxel in $G$ is contained in $N_r$ searchlights.
From Lemma 2, for any two voxels $v_a$ and $v_b$, there is necessarily
a searchlight that contains $v_a$ and not $v_b$, and vice versa.
Therefore, of the $N_r$ searchlights containing $v_a$, there
necessarily exists at least one searchlight that contains $v_a$ but
not $v_b$. Thus, the number of searchlights that simultaneously
contain both $v_a$ and $v_b$ must be less than $N_r$. Since $G$
contains multiple voxels, any pair of voxels in $G$ must be
simultaneously contained in less than $N_r$ searchlights. Therefore,
all the voxels in $G$ cannot be simultaneously contained in $N_r$
searchlights, and $G$ must be contained in strictly less than $N_r$
searchlights. □
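Theorem 4 can be illustrated by counting searchlights directly. Since the searchlights containing a voxel $g$ are those centred in $S_r(g)$, the searchlights containing a whole group $G$ are centred in the intersection of the sets $S_r(g)$ over $g \in G$. The sketch assumes an unbounded two-dimensional lattice; the groups and radius are hypothetical examples.

```python
from itertools import product

def searchlight(v, r):
    """S_r(v) on an unbounded 2-D lattice, distances in voxel units."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r}

def centres_containing(G, r):
    """Centres of all searchlights of radius r that contain every voxel in G."""
    return set.intersection(*(searchlight(g, r) for g in G))

r = 2
groups = [
    [(0, 0)],                  # single voxel
    [(0, 0), (1, 0)],          # adjacent pair
    [(0, 0), (1, 0), (0, 1)],  # L-shaped triple
]
for G in groups:
    print(len(G), "voxel(s):", len(centres_containing(G, r)), "searchlights")
```

In this example each added voxel removes some of the searchlights that contained the smaller group, so the count for every multivoxel group stays below the single-voxel count $N_r$.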

3.2. The sampling frequency of voxel(s) increases with 
searchlight radius 

Although single voxels and multivoxel groups are included in different
numbers of searchlights for any radius $r$, we now show that the
absolute number of searchlights that include either a single voxel or
a multivoxel group increases with the radius of the searchlight.

Lemma 5. A searchlight of radius $r_a$ is fully contained in more than
one searchlight of radius $r_b$, where $r_b > r_a$ and
$S_{r_a}(v) \subset S_{r_b}(v)$ for all $v \in V$.

Proof. Consider two searchlights centered at the same voxel $v$ - one
that has a radius $r_a$, and the other having radius $r_b$. By
definition, since $S_{r_a}(v) \subset S_{r_b}(v)$, all the voxels in
$S_{r_a}(v)$ are members of $S_{r_b}(v)$, and there exists at least
one voxel in $S_{r_b}(v)$ that is not in $S_{r_a}(v)$.

Now, consider the searchlight $S_z(v)$, having radius
$z = r_b - r_a$. Due to the spherical shape of searchlights, the
maximum distance between a voxel $v''$ in $S_z(v)$ and some voxel $v'$
in $S_{r_a}(v)$ is equal to $r_a + z = r_b$. Therefore, all other
voxels in $S_{r_a}(v)$ must have distances less than or equal to
$r_b$.

Since the distance between these two maximally distant voxels $v'$ and
$v''$ is equal to $r_b$, the voxel $v'$ must be contained in a
searchlight of radius $r_b$ that is centered at $v''$, namely,
$S_{r_b}(v'')$. Since all other voxels in $S_{r_a}(v)$ have a distance
less than or equal to $r_b$ from $v''$, it follows that every voxel in
$S_{r_a}(v)$ is also contained in the searchlight $S_{r_b}(v'')$. Thus
every voxel in $S_{r_a}(v)$ is contained in at least two searchlights
having radius $r_b$, namely, $S_{r_b}(v)$ and $S_{r_b}(v'')$.
Therefore, a searchlight $S_{r_a}(v)$ is contained in more than one
searchlight of radius $r_b$, where $r_b > r_a$ and
$S_{r_a}(v) \subset S_{r_b}(v)$. □

Using Lemma 5, we can now prove a general scaling property.
Irrespective of the size of a voxel group, the frequency with which it
is sampled by different searchlights increases with the radius of the
searchlight - a property that we prove next.

Theorem 6. A group of voxels $G$ is contained in more searchlights of
radius $r_b$ than searchlights of radius $r_a$, where
$G \subseteq S_{r_a}(v')$ for some $v' \in V$, $r_b > r_a$ and
$S_{r_a}(v) \subset S_{r_b}(v)$ for all $v \in V$.

Proof. Let $K_{r_a}$ and $K_{r_b}$ be the number of searchlights of
radius $r_a$ and $r_b$ that contain $G$.

Since $S_{r_a}(v) \subset S_{r_b}(v)$ for every $v \in V$, it follows,
by transitivity, that if $G \subseteq S_{r_a}(v_i)$ for some voxel
$v_i \in V$, then $G \subseteq S_{r_b}(v_i)$. Therefore, the number of
searchlights of radius $r_b$ that contain $G$ cannot be strictly less
than that for $r_a$, that is, $K_{r_b} \not< K_{r_a}$.

By the transitivity of the subset relation, if
$G \subseteq S_{r_a}(v_i)$ and $S_{r_a}(v_i) \subseteq S_{r_b}(v_j)$
for some $v_i, v_j \in V$, then it follows that
$G \subseteq S_{r_b}(v_j)$. From Lemma 5, a searchlight of radius
$r_a$ is contained in multiple searchlights of radius $r_b$ where
$r_b > r_a$ and $S_{r_a}(v) \subset S_{r_b}(v)$ (for all $v \in V$).
Since there is more than one searchlight of radius $r_b$ containing
$S_{r_a}(v_i)$, for every searchlight for which
$G \subseteq S_{r_a}(v_i)$ holds true, it implies that
$K_{r_b} \geq K_{r_a}$.

From Theorems 3 and 4, the number of searchlights of radius $r_a$ that
can contain $G$ is less than or equal to $N_{r_a}$. Consequently, in a
uniform and connected voxel-space, it follows that there exist two
adjacent voxels $v_i$ and $v_j$ in $V$ such that $G$ is a subset of
$S_{r_a}(v_i)$ but is not a subset of $S_{r_a}(v_j)$. From Lemma 5,
searchlights of radius $r_b$ centered at voxels within $r_b - r_a$
from $v_i$ fully contain all voxels in $S_{r_a}(v_i)$. Since
$S_{r_a}(v) \subset S_{r_b}(v)$, it implies that the distance of $v_j$
to $v_i$ is less than or equal to $r_b - r_a$. Therefore,
$S_{r_a}(v_i) \subseteq S_{r_b}(v_j)$ and consequently
$G \subseteq S_{r_b}(v_j)$. Since $G \not\subseteq S_{r_a}(v_j)$ and
$G \subseteq S_{r_b}(v_j)$, it implies that there exists at least one
voxel at which a searchlight of radius $r_b$ contains $G$, but where a
searchlight of radius $r_a$ does not contain $G$. Consequently,
$K_{r_b}$ must be strictly greater than $K_{r_a}$, that is, the group
of voxels $G$ is contained in more searchlights of radius $r_b$ than
$r_a$. □
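The scaling property of Theorem 6 can be observed numerically by fixing a group and varying only the radius. The sketch assumes an unbounded two-dimensional lattice with radii in voxel units; the group `G` and the radii are hypothetical choices, picked so that every larger searchlight strictly contains the smaller one.

```python
from itertools import product

def searchlight(v, r):
    """S_r(v) on an unbounded 2-D lattice, distances in voxel units."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r}

def K(G, r):
    """K_r: number of searchlights of radius r that contain the group G."""
    return len(set.intersection(*(searchlight(g, r) for g in G)))

G = [(0, 0), (1, 0), (0, 1)]  # hypothetical three-voxel group
radii = (1, 2, 3, 4)
counts = [K(G, r) for r in radii]
for r, k in zip(radii, counts):
    print("r =", r, "K_r =", k)
```

The group itself never changes in this loop, so any growth in `K_r` reflects the geometry of the decomposition alone.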

Theorem 6 above establishes that the number of searchlights that
include either a voxel or group of voxels increases monotonically with
the radius of the searchlight. How then does this scaling of the
sampling bias influence the properties of the information map?

3.3. An optimistic bias in the information map 

Recall that $F_r$ (Equation 4) is an index of the sensitivity of the
searchlight method in detecting multivoxel response patterns, and is
equal to the total number of informative searchlights with a
particular searchlight decomposition. We now prove that as a direct
consequence of how the sampling bias scales with the searchlight
radius, the value of $F_r$ also increases strictly monotonically with
increasing searchlight radius.



Theorem 7. For two searchlight radii, $r_a$ and $r_b$, where
$r_b > r_a$ and $S_{r_a}(v) \subset S_{r_b}(v)$ for every $v \in V$,
if $0 < F_{r_a} < |V|$ then $F_{r_b} > F_{r_a}$.

Proof. Since $S_{r_a}(v) \subset S_{r_b}(v)$ for every $v \in V$, by
the SIN assumption, it follows that if $I(S_{r_a}(v)) = 1$ then
$I(S_{r_b}(v)) = 1$, for any voxel $v \in V$. Therefore, the number of
informative searchlights of radius $r_b$ cannot be strictly less than
that for $r_a$, that is, $F_{r_b} \not< F_{r_a}$, for any value of
$F_{r_a}$.

From Lemma 5, a searchlight of radius $r_a$ is contained in multiple
searchlights of radius $r_b$ where $r_b > r_a$ and
$S_{r_a}(v) \subset S_{r_b}(v)$ (for all $v \in V$). For every
searchlight for which $I(S_{r_a}(v)) = 1$, there is more than one
searchlight of radius $r_b$ containing $S_{r_a}(v)$. Since each
informative searchlight of radius $r_a$ is a subset of multiple
searchlights of radius $r_b$, by the SIN assumption, it implies that
$F_{r_b} \geq F_{r_a}$.

Let $0 < F_{r_a} < |V|$. Since $F_{r_a} < |V|$, there necessarily
exist two adjacent voxels $v_i$ and $v_j$ such that
$I(S_{r_a}(v_i)) = 1$ and $I(S_{r_a}(v_j)) = 0$. By the same logic
used to prove Theorem 6, searchlights of radius $r_b$ centered at
voxels within $r_b - r_a$ from $v_i$ fully contain all voxels in
$S_{r_a}(v_i)$. Consequently, by the SIN assumption,
$I(S_{r_b}(v_j)) = 1$. This implies that a searchlight centered at
voxel $v_j$ is informative if it has a radius $r_b$ but not if it has
a radius $r_a$. Therefore $F_{r_b} > F_{r_a}$. □

What does Theorem 7 have to do with optimism? The monotonic increase
in the number of informative searchlights is due to increases in the
sampling bias, which in turn is due to the use of a multivoxel
searchlight. Specifically, it is possible to obtain an increased
"sensitivity" of the information map simply by increasing the radius
of the multivoxel searchlights, with no reference to the statistical
properties of the voxel-responses, i.e., whether they in fact exhibit
multivariate response differences.
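The effect described by Theorem 7 can be sketched by evaluating $F_r$ over a whole grid for several radii while holding the underlying signal fixed. The example assumes a bounded 15 × 15 two-dimensional grid, the SIN assumption, and two hypothetical informative groups (one isolated voxel and one adjacent pair) placed far enough from the edges that truncation does not matter.

```python
from itertools import product

def searchlight(v, r, shape):
    """S_r(v) on a bounded 2-D grid, distances in voxel units."""
    k = int(r)
    return {(v[0] + dx, v[1] + dy)
            for dx, dy in product(range(-k, k + 1), repeat=2)
            if dx * dx + dy * dy <= r * r
            and 0 <= v[0] + dx < shape[0] and 0 <= v[1] + dy < shape[1]}

def count_informative(shape, r, groups):
    """F_r under SIN: a searchlight is informative when it contains at
    least one of the given informative groups in full."""
    voxels = product(range(shape[0]), range(shape[1]))
    return sum(
        any(set(g) <= searchlight(v, r, shape) for g in groups)
        for v in voxels
    )

shape = (15, 15)
groups = [[(3, 3)], [(10, 10), (11, 10)]]  # hypothetical: one voxel, one pair
F = {r: count_informative(shape, r, groups) for r in (1, 2, 3)}
print(F)
```

The set of signal-carrying voxels is identical for every radius, yet the count of informative searchlights grows, and the single voxel contributes more to that count than the pair does.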

4. An illustration 

In this section, we present simulations to provide a concrete
intuition for the analytical results above, and their implications.
For ease of demonstration, the voxel-space $\mathcal{V}$ for all
simulations consisted of a single axial slice having two principal
directions. All the voxels in this voxel-space were populated with
simulated response information from two fictitious experimental
conditions A and B. These simulated data were subjected to the
searchlight procedure to produce information maps. The radius $r$ of
the searchlights used for the searchlight decomposition
$\mathcal{S}_r(V)$ was varied systematically to produce a
corresponding information map for each radius value. The radius took
the values: 4 mm, 6 mm, 8 mm, 10 mm and 12 mm, corresponding to
searchlights containing 5 voxels, 13 voxels, 21 voxels, 37 voxels and
49 voxels respectively.
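The voxel counts quoted above can be reproduced with a short sketch, assuming an in-plane voxel spacing of 3 mm (the voxel size used for Figure 2) and distances measured between voxel centres within the slice.

```python
from itertools import product

def voxels_in_disc(radius_mm, voxel_mm=3):
    """Number of voxels of a 2-D slice whose centres lie within radius_mm
    of a given voxel centre (in-plane spacing voxel_mm is an assumption)."""
    k = radius_mm // voxel_mm  # largest offset (in voxels) worth testing
    return sum(
        1
        for dx, dy in product(range(-k, k + 1), repeat=2)
        if voxel_mm ** 2 * (dx * dx + dy * dy) <= radius_mm ** 2
    )

print([voxels_in_disc(r) for r in (4, 6, 8, 10, 12)])
# prints [5, 13, 21, 37, 49]
```

The comparison uses squared integer distances, so radii that fall exactly on a voxel centre (6 mm and 12 mm here) are included without any floating-point rounding.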

The simulated response-data differed in the num- 
ber and relative spatial location of the voxels that 
were responsive to the experimental conditions. In 
the first of these simulations discussed next, a single 
voxel contained task-relevant information while all 
the remaining voxels did not. 

4.1. The needle-in-the-haystack effect 

Suppose there exists some voxel in $V$, say $v_0$, that exhibits a response difference to the experimental conditions such that the informativeness function identifies $v_0$ as being task-relevant, that is, $I(v_0) = 1$. Since $I(v_0) = 1$, by the SIN assumption it follows that each of the searchlights that contain $v_0$ should also be deemed informative. Recall that, according to Theorem 3, each voxel $v$ in $V$ is contained in exactly $N_r$ searchlights, where $N_r = |S_r(v)|$. It then follows that the signal-carrying voxel $v_0$ should be contained in $N_r$ searchlights, each centered at a voxel in $S_r(v_0)$. Thus, a single signal-carrying voxel (a "needle") should produce a cluster having $N_r$ voxels on the information map (a "haystack").

To simulate this "needle-in-the-haystack" effect, the task-relevant responses of $v_0$ in conditions A and B took the form illustrated in Figure 3(a). The responses to both conditions were drawn randomly from a normal distribution with standard deviation $\sigma = 1$. The mean response of voxel $v_0$ was $\mu_A = +2$ for condition A and $\mu_B = -2$ for condition B. The responses of all other (non task-relevant) voxels were drawn from normal distributions having $\sigma = 1$ and $\mu_A = \mu_B = 0$. To maximize the sensitivity of the searchlight statistic and emulate the requirements of the SIN assumption, a total of 300 samples were drawn for each condition. The spatial position of voxel $v_0$ is shown in blue in Figure 3(b). The voxel was placed far from the boundaries of the slice to avoid truncations of the searchlights and to emulate a uniform voxel-space in the vicinity of $v_0$.

Figure 3: Single-voxel response: (a) Scatter plot of simulated responses of voxel $v_0$ to conditions A and B. A total of 300 samples were drawn per condition (see text). (b) Spatial location of voxel $v_0$ (indicated by blue square) in the simulated single-slice voxel space.

With this setup, the searchlight decomposition and testing procedure was implemented using the PyMVPA toolbox (Hanke et al., 2009). Each searchlight's informativeness was determined by evaluating the decodability of its responses, i.e., testing for the existence of a model that accurately classifies a sample's membership in each condition based on the searchlight's responses (Pereira and Botvinick, 2011; Pereira et al., 2009). Decodability was tested using a linear Support Vector Machine (SVM) with a soft-margin regularization parameter, $C = 1$. The searchlight statistic was the mean classification accuracy obtained using a Leave-One-Out (LOO) cross-validation procedure.
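
As a rough illustration of this kind of searchlight statistic, the sketch below simulates the single-voxel signal described above and computes a leave-one-out accuracy for two five-voxel searchlights, one containing the signal voxel and one that does not. It is not the PyMVPA pipeline: to stay within the Python standard library, a nearest-centroid classifier (also a linear decision rule) is swapped in for the linear SVM, and the names `simulate` and `loo_accuracy`, the voxel indexing, and the random seed are all hypothetical choices made for the example.

```python
import random


def simulate(n_voxels, signal_index, n_per_condition=300, seed=0):
    """Unit-variance Gaussian responses; only `signal_index` has means +2 (A) and -2 (B)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, mean in (("A", 2.0), ("B", -2.0)):
        for _ in range(n_per_condition):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
            row[signal_index] += mean
            samples.append(row)
            labels.append(label)
    return samples, labels


def loo_accuracy(samples, labels, voxels):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `voxels`.

    The nearest-centroid rule is a stand-in for the linear SVM used in the paper.
    """
    sums = {"A": [0.0] * len(voxels), "B": [0.0] * len(voxels)}
    counts = {"A": 0, "B": 0}
    for row, label in zip(samples, labels):
        counts[label] += 1
        for k, v in enumerate(voxels):
            sums[label][k] += row[v]
    correct = 0
    for row, label in zip(samples, labels):
        distances = {}
        for cls in ("A", "B"):
            held_out = cls == label  # exclude the test sample from its own class centroid
            n = counts[cls] - (1 if held_out else 0)
            d = 0.0
            for k, v in enumerate(voxels):
                total = sums[cls][k] - (row[v] if held_out else 0.0)
                d += (row[v] - total / n) ** 2
            distances[cls] = d
        predicted = min(distances, key=distances.get)
        correct += predicted == label
    return correct / len(samples)


# Voxel 0 plays the role of v0; voxels 1..9 carry noise only.
samples, labels = simulate(n_voxels=10, signal_index=0)
with_v0 = loo_accuracy(samples, labels, voxels=[0, 1, 2, 3, 4])
without_v0 = loo_accuracy(samples, labels, voxels=[5, 6, 7, 8, 9])
print(with_v0, without_v0)
```

With a signal this strong, the searchlight that includes the signal voxel should classify well above the 80% threshold, while the noise-only searchlight should stay near chance. The exact accuracies depend on the seed and on the stand-in classifier.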

Figure 4(a) shows the information maps obtained (thresholded at 60%). In the upper panel, going from left to right in order of increasing radius, there is a single high-accuracy cluster (red-colored voxels) centered at the signal-carrying voxel $v_0$, and this cluster grows in size with increasing radius. The lower panel shows an expanded view of this high-accuracy cluster, thresholded at 80%. Consistent with the predictions described above, for each radius, the size and shape of these clusters on the information map correspond exactly to the size, shape and location of the searchlight $S_r(v_0)$ centered at voxel $v_0$. Furthermore, consistent with Theorem 7, the number of informative searchlights identified ($F_r$) increases monotonically with the radius of the searchlight, even though there is no difference in the actual information present, nor any multivoxel response pattern to speak of.
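
The purely geometric part of this result can be reproduced without any classifier. The sketch below encodes the SIN assumption directly: a searchlight is marked informative whenever it contains an informative voxel group. The 3 mm voxel size, the 41 x 41 slice, the position of the signal voxel, and the function names are assumptions made for illustration.

```python
def searchlight(center, radius_mm, voxel_mm=3):
    """Set of voxel coordinates in the 2D searchlight centered at `center`."""
    cx, cy = center
    reach = radius_mm // voxel_mm + 1
    return {
        (cx + dx, cy + dy)
        for dx in range(-reach, reach + 1)
        for dy in range(-reach, reach + 1)
        if (voxel_mm ** 2) * (dx * dx + dy * dy) <= radius_mm ** 2
    }


def information_map(informative_groups, shape, radius_mm):
    """Centers whose searchlight fully contains at least one informative group.

    This is the SIN assumption stated as code: a searchlight is informative
    whenever it is a superset of an informative voxel group.
    """
    width, height = shape
    informative_centers = set()
    for x in range(width):
        for y in range(height):
            members = searchlight((x, y), radius_mm)
            if any(group <= members for group in informative_groups):
                informative_centers.add((x, y))
    return informative_centers


v0 = (20, 20)  # hypothetical position, far from the slice boundary
for radius_mm in (4, 6, 8, 10, 12):
    cluster = information_map([{v0}], shape=(41, 41), radius_mm=radius_mm)
    # The cluster coincides with the searchlight centered at v0.
    print(radius_mm, len(cluster), cluster == searchlight(v0, radius_mm))
```

For each radius, the informative cluster has exactly the size and shape of the searchlight centered at the signal voxel, and its size grows with the radius even though only one voxel carries signal.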

Figure 4(b) shows the values on the information map along a single 1D segment running horizontally through the voxel $v_0$, spanning the diameter of the searchlights centered at $v_0$. The voxel $v_0$ is assigned the position 0. Consistent with the SIN assumption, the accuracies on the information map do not degrade smoothly as a function of the distance from $v_0$. Critically, this pedestal-like profile is unlike the profile that would be expected if the searchlights were the equivalent of a "spatial smoothing" kernel on the information map.

What is the comparable effect on the information 
map when the task-relevant signal is distributed 
over multiple voxels? We next consider this sce- 
nario. 



4.2. The haystack-in-the-needle effect 

Figure 4: Information maps for single-voxel response: (a) Upper panel shows the information maps across the entire slice as a function of increasing searchlight radius going from left to right (threshold = 60% accuracy). Lower panel shows the corresponding expanded view of the high-accuracy clusters (threshold = 80% accuracy) for each searchlight radius value. Horizontal lines provide a common reference to compare the relative sizes of clusters. (b) Cross-sectional profile of the information map showing classification accuracies for searchlights centered at voxels on a horizontal 1D slice through the voxel $v_0$. The dotted horizontal line indicates the thresholding value of 80%. Only the profiles for radii 4 mm, 8 mm and 12 mm are shown.

Suppose there are two voxels, $v_1$ and $v_2$ in $V$, that conjointly exhibit a response difference to the experimental conditions. However, neither voxel by itself shows a task-relevant difference. That is, $I(\{v_1, v_2\}) = 1$ and $I(v_1) = I(v_2) = 0$. By the SIN assumption, every searchlight that contains both $v_1$ and $v_2$ should be informative, but searchlights that contain either $v_1$ or $v_2$ alone would not necessarily be informative. Recall that, according to Theorem 4, a group of multiple voxels (i.e., having more than one voxel) is contained in strictly fewer than $N_r$ searchlights. It then follows that the signal-carrying voxel group $\{v_1, v_2\}$ should produce a cluster having fewer than $N_r$ voxels on the information map, i.e., a multivoxel "haystack" should produce a "needle"-like cluster, unlike the needle-in-the-haystack scenario in Section 4.1 above.
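
The counting argument for a voxel pair can be sketched in a few lines. By symmetry, the searchlights that contain both voxels are exactly those centered in the intersection of the two searchlights centered at each voxel. The 3 mm voxel size and the center-to-center offsets used here are hypothetical and need not coincide with the "separation" convention of the simulations that follow.

```python
def searchlight(center, radius_mm, voxel_mm=3):
    """Set of voxel coordinates in the 2D searchlight centered at `center`."""
    cx, cy = center
    reach = radius_mm // voxel_mm + 1
    return {
        (cx + dx, cy + dy)
        for dx in range(-reach, reach + 1)
        for dy in range(-reach, reach + 1)
        if (voxel_mm ** 2) * (dx * dx + dy * dy) <= radius_mm ** 2
    }


def shared_centers(v1, v2, radius_mm):
    """Centers of searchlights that contain both v1 and v2."""
    return searchlight(v1, radius_mm) & searchlight(v2, radius_mm)


v1 = (0, 0)
for offset in (3, 4):  # hypothetical center-to-center offsets, in voxels
    v2 = (offset, 0)
    for radius_mm in (4, 6, 8, 10, 12):
        n_r = len(searchlight(v1, radius_mm))
        n_both = len(shared_centers(v1, v2, radius_mm))
        # n_both is always smaller than n_r, and is zero for small radii
        print(offset, radius_mm, n_both, n_r)
```

For every radius the pair is contained in fewer than $N_r$ searchlights, and for radii that are small relative to the offset it is contained in none.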

To simulate this "haystack-in-the-needle" effect, the task-relevant responses in the two voxels $v_1$ and $v_2$ took the form shown in Figure 5(a). The responses to each condition were drawn randomly from a normal distribution having standard deviation $\sigma = 1$. Each voxel's mean response to conditions A and B is shown as a dotted line. The voxel $v_1$ had an identical mean response to both conditions A and B, specifically, $\mu_A = \mu_B$ (the horizontal dotted line), while the mean response of voxel $v_2$ was $\mu_A = +0.5$ to condition A and $\mu_B = -0.5$ to condition B (indicated by the dotted vertical lines). Importantly, the responses of voxels $v_1$ and $v_2$ to both conditions were negatively correlated. The response of voxel $v_1$ on condition A, denoted $X_{1,A}$, was equal to $-X_{2,A}$, the negative of the response of voxel $v_2$ to condition A. Similarly, for condition B, $X_{1,B} = -X_{2,B}$. The simulated responses of all other voxels were drawn from distributions having $\sigma = 1$ and $\mu_A = \mu_B = 0$, and were uncorrelated with the responses in either voxel $v_1$ or $v_2$. As with the previous simulation, a total of 300 samples were drawn for each condition. With signals of this form, the conjoint responses of voxels $v_1$ and $v_2$ to conditions A and B are linearly separable (see Figure 5(a)). However, A and B cannot be distinguished from the responses in $v_1$ alone, whereas they should be weakly discriminable from the responses in $v_2$ alone.

The relative spatial positions of $v_1$ and $v_2$, indicated as blue squares, are shown in Figure 5(b). We considered two cases: $v_1$ and $v_2$ were separated by 2 voxels in one case, and by 3 voxels in the other. When $v_1$ and $v_2$ have a separation of 2 voxels, no searchlight of radius 4 mm can contain both of these voxels. With a separation of 3 voxels, no searchlight of radius 4 mm, 6 mm, or 8 mm can contain both $v_1$ and $v_2$. With this setup, the searchlight decomposition and testing procedure was simulated in the same manner as in Section 4.1.

Figure 6 shows the portions of the information maps in the vicinity of voxels $v_1$ and $v_2$ (thresholded at 60%). In all the information maps, the above-threshold cluster takes the size and shape of the corresponding searchlight and is centered at voxel $v_2$, namely, the voxel exhibiting a weak response difference to conditions A and B. This "needle-in-the-haystack" organization is consistent with the simulations in Section 4.1 and is invariant to the number of voxels separating $v_1$ and $v_2$.

Figure 5: Multivoxel response: (a) Scatter plot of simulated responses of voxels $v_1$ and $v_2$ to conditions A (gray squares) and B (black dots). Dotted lines indicate the mean response of each voxel alone to A and B. (b) Spatial location of voxels $v_1$ and $v_2$ (indicated by blue squares) in the simulated single-slice voxel space. The separation between $v_1$ and $v_2$ was either 2 or 3 voxels (see text). The responses of voxel $v_2$ show a weak response difference to the conditions, indicated by the dotted circle.

Now, observe that the clusters in several, but not all, of the information maps contain sub-clusters consisting of voxels having high classification accuracies (indicated in red). These voxels on the information map correspond to the centers of searchlights that contain both $v_1$ and $v_2$. As required by Theorem 4, for each radius, the number of high-accuracy voxels in the cluster is less than $N_r$. Due to the geometric constraint defined by the separation between $v_1$ and $v_2$, the presence of any high-accuracy voxels at all in an information map depends on the radius of the searchlights used. For example, information maps obtained with searchlights of radius 4 mm do not contain any high-accuracy voxels for either separation (top row), while the information maps for searchlights of radius 8 mm contain high-accuracy voxels for the 2-voxel separation but not for the 3-voxel separation.

Figures 6(b) and (c) show the 1D cross-section of the information map through the horizontal diameter of the clusters for the 2-voxel and 3-voxel separations, respectively. As is evident, there is a "smearing", rather than a smoothing, of the accuracies with growing radius values, as in Figure 4(b). Furthermore, when a searchlight is large enough to include both $v_1$ and $v_2$, there is a large increase in the classification accuracy.

The above simulations confirm the basic statistical premise motivating the searchlight procedure, namely, the ability of a multivoxel pattern analysis method to detect distributed response patterns. However, for any radius, the clusters produced by multivoxel response patterns are smaller than those produced by single-voxel response differences. Consistent with Theorem 7, the number of informative searchlights identified increases monotonically with the radius of the searchlight.



4.3. Whole-brain inflation maps 

The previous two simulations demonstrated signal-dependent effects caused by the sampling bias inherent in the searchlight decomposition. However, according to Theorem 7, there should be a monotonic increase in the number of informative searchlights as a function of radius, irrespective of the actual distribution of task-relevant voxels/voxel-groups across the brain. This monotonic scaling of the size of the "blobs" on the information map makes plausible a rather unusual scenario: an information map where every searchlight in the brain is deemed to be informative.

Figure 6: Information maps for multivoxel response: (a) Left and right panels show portions of the information map (threshold = 60% accuracy) centered around voxels $v_1$ and $v_2$ as a function of increasing searchlight radius (top to bottom) and of the separation between $v_1$ and $v_2$: 2-voxel separation (left panel) and 3-voxel separation (right panel). Plots (b) and (c): Cross-sectional profile of the information map showing classification accuracies for searchlights centered at voxels on a horizontal 1D slice through the voxels $v_1$ and $v_2$ for a separation of 2 voxels (b) and 3 voxels (c). The lower dotted horizontal line indicates the thresholding value of 60%, and the upper dotted line indicates the regime of the high-accuracy searchlights (80%).

This scenario was motivated by results recently reported by Poldrack et al. (2009). In that study, information maps were generated using searchlights of radius 4 mm and 8 mm. Rather remarkably, with a radius of 8 mm, only one region in the information map (the bilateral dorsolateral prefrontal cortex) was found to be uninformative while every other searchlight was informative. This whole-brain coverage was, however, not the case with the 4 mm searchlights. Given the inflationary relationship between $F_r$ (the number of informative searchlights) and searchlight radius $r$ that was established in the previous sections, we asked: could a fully informative whole-brain map arise (i.e., $F_r = |V|$) by random chance with a suitably chosen searchlight radius?

This question can be formulated as a covering problem. Consider a finite 3D voxel space corresponding to one containing the brain, approximated as a cubic volume of size $N_x \times N_y \times N_z$, where $N_i$ is the number of voxels along the principal direction $i$. Suppose there is a minimum covering set of searchlights $C_r \subset S_r(V)$ such that every voxel in $V$ is contained in some searchlight in $C_r$. Recall that a single-voxel signal can produce a cluster having $N_r$ voxels on the information map, due to Theorem 3. If the central voxel of each of the searchlights in $C_r$ was informative, it would follow that searchlights centered at every voxel in every one of the searchlights in $C_r$ would also be informative. Since every voxel in $V$ is present in some searchlight in $C_r$, it implies that a rather sparse distribution of informative single voxels specified by $C_r$ could produce an information map where every searchlight in $S_r(V)$ would be informative (with the proviso that the SIN assumption holds true).

The sparsity of these informative single voxels can be readily approximated if we use cubical volumes as a proxy for the spherical shape of the searchlight volumes. A cube of side $w$ voxels would fully contain a sphere having radius $r = w/2$, and would be fully contained in a sphere of radius $w\sqrt{3}/2$. With this simplification, the minimal number of searchlight cubes required to cover the voxel space $V$ is readily approximated as the volume of the voxel space divided by the volume of each searchlight cube, that is, $\approx \lceil (N_x N_y N_z)/w^3 \rceil$.
Figure 7: Number of non-intersecting cubes required to fill a volume of dimensions $N_x = 52$, $N_y = 63$, $N_z = 45$ ($N_i$ = number of voxels in principal direction $i$) as a function of cube size. Values next to each data point indicate the actual number of filling cubes for each cube size.



For voxels of size $3 \times 3 \times 3$ mm$^3$, we approximate the size of the voxel space with the following values: $N_x = 52$ voxels, $N_y = 63$ voxels and $N_z = 45$ voxels. Figure 7 shows the minimum number of cubical volumes required to cover $V$ as a function of $w$, where $w$ took the values 1, 3, 5, 7, 9, 11.

A searchlight cube of side $w = 1$ is equivalent to a single voxel, so the size of the covering set $C_r$ is equal to the total number of voxels in $V$, namely, 147,420. However, increasing values of $w$ produce a rapid decrease in the size of the covering set. For $w = 3$ voxels, a cubical volume that would fully contain a spherical searchlight of radius 4 mm, a total of 5,460 equally spaced signal-carrying voxels can produce an information map where every searchlight is informative. For a cubical volume with side $w = 7$ voxels, corresponding to spherical volumes of radius 8 mm, a mere 430 voxels are required for such a fully informative map. Stated differently, an information map with a single task-relevant cluster made up of every voxel in $V$ can be generated from a mere 430 regularly spaced voxels of the 147,420 voxels in $V$, that is, 430 voxels that enable the conditions to be distinguished, whether due to the presence of true signal or by random chance. This potential for a small number of single voxels (i.e., $\approx 0.3\%$ of $|V|$) to drive the structure of the entire information map simply by the choice of the searchlight radius is an important consideration when drawing neurobiological interpretations.
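
The covering estimate is simple enough to verify directly. The sketch below evaluates the approximation (voxel-space volume divided by cube volume, rounded up) for the voxel-space dimensions given above; the function name `covering_cubes` is illustrative, and the values plotted in Figure 7 may differ slightly where they reflect an exact filling rather than this approximation.

```python
def covering_cubes(n_x, n_y, n_z, w):
    """Approximate number of side-w cubes needed to cover the volume: ceil(NxNyNz / w^3)."""
    total_voxels = n_x * n_y * n_z
    return -(-total_voxels // w ** 3)  # integer ceiling division


n_x, n_y, n_z = 52, 63, 45
total = n_x * n_y * n_z
for w in (1, 3, 5, 7, 9, 11):
    n_cubes = covering_cubes(n_x, n_y, n_z, w)
    # Report the covering-set size and the fraction of |V| it represents.
    print(w, n_cubes, f"{100 * n_cubes / total:.2f}% of |V|")
```

For $w = 1$, 3 and 7 this reproduces the covering-set sizes of 147,420, 5,460 and 430 quoted in the text.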



5. Discussion

Knowledge of the actual information-carrying voxels in each informative searchlight would make the information map irrelevant. These actual informative voxels could be directly reported, hence resolving the over-counting that arises from their inclusion in multiple searchlights. One possible implementation would be to identify task-relevant voxels in each searchlight, and then combine these identified voxels across searchlights. However, requiring the identification of the actual informative voxels in each searchlight could reduce the generality of the searchlight method. When pattern classifiers are used to compute the searchlight statistic, each voxel (or feature) in a searchlight is typically assigned a weight, and the weighted combination of the multivoxel responses is used to make a classification decision. However, the specific basis for assigning weights to individual features is highly dependent on the specific machine learning algorithm and its inductive assumptions (Mitchell, 1980; Wolpert, 1996; Guyon et al., 2002; Pereira et al., 2009). Consequently, appropriate techniques would be required to allow results to be compared across studies that use different MVPA techniques.

Until such advances are made, the analytical framework described above provides several constraints on alternate interpretations of the information map. Our results present a strong argument against measuring the sensitivity of information mapping by a count of the number of informative searchlights. The seemingly high sensitivity of the searchlight method as judged by such a performance measure has, in part, a rather trivial explanation. Specifically, the explanation lies in the obligatory geometric properties of the searchlight method as discussed above, rather than in the underlying neural organization, the sophisticated machine-learning algorithms used to analyze multivoxel response patterns, or the widely discussed merits of multivariate statistical evaluations. Indeed, the upshot of the optimistic scaling of this performance measure is that it is maximal when explicitly assuming a highly sensitive and robust MVPA technique, namely one satisfying the superset informativeness (SIN) assumption.
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